ABSTRACT

An apparatus and method that speeds the processing of data
with the present invention, a vector zero overhead loop with
parallel issue processes multiple data elements at the same
time, and yet is programmed with readable assembly language
and requires neither vector registers nor a lot of extra
ACCELERATING VECTOR PROCESSING USING PLURAL SEQUENCERS TO
PROCESS MULTIPLE LOOP ITERATIONS SIMULTANEOUSLY

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to data processing, and more
particularly to a method and apparatus for increasing the speed of
processing data vectors in a digital signal processor or
microprocessor without requiring vector registers or a large number
of registers.

2. Description of the Related Art

A data vector is a series of data elements. The concept of vector
processing has been incorporated into computing systems to provide
high computational throughput for many applications by performing the
same series of operations on each data element or pairs of data
elements.

Typically, the vector processing loops required to perform the same
series of operations on each data element or pairs of data elements
dominate the amount of time required to process signal processing
kernels. The time required to perform these vector processing loops
has been decreased in a number of ways utilizing both hardware and
software. For example, software techniques include unrolling the
loops, using parallel issue including reordering the instructions,
and software pipelining. In hardware, zero overhead looping, parallel
execution (both superscalar and instruction indicated),
post-address-modifying loads and stores, vector units and vector
registers, and Very Long Instruction Word (VLIW) instructions that do
several of the required operations in parallel have been implemented.
Although these methods increase the speed of the vector processing,
they either require extra code, make the required assembly code hard
to read and understand, or require extra registers that are not used
except for these vector operations.

One approach to exploiting the kind of parallelism inherent in vector
processing is through the use of dynamic scheduling. Several dynamic
scheduling techniques are known in the art, including superscalar,
scoreboarding, and reservation stations. Reservation stations, in
particular, address the problem of executing multiple iterations of a
loop without changing the source code. Reservation stations work by
eliminating false dependencies between the instructions of different
loop iterations. When the instructions of a particular iteration are
executed by a sequential issue machine, dependencies between the
instructions within the iteration may block issuing of instructions
in the next iteration, even though there are sufficient hardware
resources and no dependencies between the current iteration and the
next. Reservation stations allow an instruction to be issued and
buffered at a functional unit for later execution. This frees the
issue pipeline to process additional instructions and begin the next
iteration before the current one is finished. Reservation stations,
however, require additional hardware, are extremely complex, and make
the execution time of the loop non-deterministic.

Thus, there exists a need for an apparatus and method that increases
the speed of processing of data vectors by processing multiple data
elements at the same time which is programmed with readable assembly
language and does not require a lot of extra registers.

SUMMARY OF THE INVENTION

The present invention overcomes the problems associated with the
prior art and provides an apparatus and method that speeds the
processing of data vectors using a zero overhead loop with parallel
issue and post-address-modifying loads and stores that processes
multiple data elements at the same time, and yet is programmed with
readable assembly language and does not require a lot of extra
registers.

In accordance with the present invention, the loop instructions are
formed as producer-consumer instructions, i.e., the result of
instruction M is used only by instruction M or M+1, and the results
are stored into different registers. A compiler or assembler detects
the producer-consumer loops, reassigns registers to meet the
different result criteria, and encodes the zero overhead loop as a
vector zero overhead loop. Since the loop analysis is done in
software, there is no additional hardware required to detect it.
Also, since general purpose registers are used, there is no need for
vector registers. Furthermore, since only register assignments and
the zero overhead loop instruction are changed to a vector zero
overhead loop instruction, the readability of the assembly code is
maintained.

These and other advantages and features of the invention will become
apparent from the following detailed description of the invention
which is provided with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an apparatus which enables the vector processing
in accordance with the present invention;

FIG. 2 illustrates in block diagram form a path for the instructions
that implements the vdo processing in accordance with the present
invention;

FIG. 3 illustrates in flow chart form the steps of a vdo sequencer in
accordance with the present invention;

FIG. 4A illustrates an example of a code sequence including a
multiple issue packet with the corresponding loop program counter
(lpc) value and mipn for each line in accordance with the present
invention; and

FIG. 4B illustrates in table format the values of the registers in
each sequencer during execution of the code from FIG. 4A in
accordance with the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will be described as set forth in the preferred
embodiment illustrated in FIGS. 1-4. Other embodiments may be
utilized and structural, logical or programming changes may be made
without departing from the spirit or scope of the present invention.

In accordance with the present invention, the speed of processing
data vectors is increased utilizing a vector zero overhead loop in
place of a zero overhead loop. For the purposes of this discussion an
instruction is defined as a packet of one or more instructions that
can be issued in parallel, which may also be referred to as a
multi-issue packet.

The invention operates by starting loop iteration N+1 (N+2, etc.)
before iteration N is finished. For example, below is a
representative loop in a pseudo code assembly language that
multiplies a vector and a scalar and puts the result into a second
vector utilizing a conventional zero overhead loop, i.e., a do
instruction:
    do $4 {                 ! do instructions between { and } 4 times
      ld.post r2, r3, $4    ! load element pointed to by r3 into r2
                            ! add 4 to r3 (post increment)
      mul r2, r0, r2        ! multiply r0 times r2
      st.post r2, r5, $4    ! store r2 to memory pointed to by r5
                            ! add 4 to r5
    }

The conventional zero overhead loop above specifies four iterations,
with three cycles in each iteration. Before iteration N+1 can begin,
iteration N must be complete. For example, before the second
iteration can begin, the first iteration must be complete, before the
third iteration can begin, the second iteration must be complete, and
so forth. This means that a total of twelve cycles (three per
iteration) is required to complete the loop. In accordance with the
present invention, the processing speed of the loop is increased by
starting loop iteration N+1 (N+2, etc.) before iteration N (N+1,
etc.) is finished utilizing a vector zero overhead loop. For a loop
to be processed in this manner, its instructions must be
producer-consumer instructions, i.e., the result of instruction M
must be used only by instruction M or M+1. In addition, the results
must be stored in different registers.

Thus, the zero overhead loop above may be changed to a vector zero
overhead loop, i.e., a vdo instruction, as follows:

    vdo $4 {                ! do instructions between { and } 4 times
                            ! can start the next iteration before the
                            ! current iteration completes.
      ld.post r2, r3, $4    ! load element pointed to by r3 into r2
                            ! add 4 to r3
      mul r6, r0, r2        ! multiply r0 times r2
                            ! put result in r6
      st.post r6, r5, $4    ! store r6 to memory pointed to by r5
                            ! add 4 to r5
    }

In the vector zero overhead loop above the do instruction was changed
to a vdo instruction and the result of the multiply was placed into
register r6 instead of register r2 since r2 was used to store the
result of the first load.

In accordance with the present invention, the apparatus must ensure
that each result register is read by its consumer before the next
iteration overwrites it. FIG. 1 illustrates in block diagram form the
major data paths and control logic for a processor 10 which is
capable of performing multiple iterations at once and ensuring that
the result registers are not overwritten before they are read in
accordance with the present invention. Processor 10 includes several
functional units as are known in the art, such as, for example, a
load/store (LDST) functional unit 12a, a Multiply/Accumulate (MAC)
functional unit 12b, and a Shifter functional unit 12c. A register
by-pass network 14, as is known in the art, enables the result from a
functional unit 12 to be used as an operand by the same or a
different functional unit 12 in a succeeding cycle. Thus, by
utilizing the register by-pass network 14, instruction M of iteration
N+1 can issue in parallel with instruction M+1 of iteration N since
reads are done at the beginning of the instruction execute cycle and
writes are done at the end of the cycle, possibly in a different pipe
stage. A register file 16 is provided to store results from a
functional unit 12. An instruction decode and dispatch unit 20 is
provided to perform instruction selection and routing functions, and
is connected to each functional unit 12 and the register by-pass
network 14 via line 22 to route instructions under either normal
operation or operation in accordance with the present invention.

In accordance with the present invention, a vdo control unit 30,
which comprises a first instruction sequencer SEQ0 32 and a second
instruction sequencer SEQ1 34, a loop buffer 40, a fetch unit 42 for
sending addresses to and getting instructions back from a memory (not
shown), an iteration counter 60, a number of iterations (num_iter)
register 61 and a number of instructions (num_instr) register 62 are
provided. Although only two instruction sequencers 32, 34 are shown,
the invention is not so limited, and any number of instruction
sequencers greater than two may be used. Iteration counter 60 stores
the number of iterations that have been processed in whole or in part
by instruction sequencers 32, 34. The number of iterations (num_iter)
register 61 is adapted to store a value representing a total number
of iterations of said program loop performed by instruction
sequencers 32, 34. The number of instructions (num_instr) register 62
is adapted to store a value representing a total number of
instructions of said program loop performed by instruction sequencers
32, 34.

Fetch unit 42, loop buffer 40 and vdo control unit 30 are connected
to instruction decode and dispatch unit 20 via line 23. State
machines (not shown), as are known in the art, in each sequencer 32,
34 implement functional unit allocation, control of loop
initialization, and the start of each iteration in accordance with
the present invention. Functional unit allocation is used to give
priority to preceding iterations over succeeding iterations, for
example to iteration N, then N+1, N+2, etc. The sequencers 32, 34 are
used to execute the instructions in the loop.

The implementation of the vdo capability in accordance with the
present invention relies on the loop buffer 40 having multiple read
ports and an instruction issue data path that can be fed either from
a normal memory fetch path or from the loop buffer 40 under control
of either the vdo control 30 or the instruction decode and dispatch
unit 20. The overall structure of the instruction path of the
processor 10 of FIG. 1 is illustrated generally in FIG. 2.

When the instruction decode and dispatch unit 20 detects a vdo
opcode, sequencers SEQ0 32 and SEQ1 34 are initialized and iteration
counter 60 is set to zero. The number of iterations (num_iter)
register 61 is set to the vdo argument, i.e., the number of
iterations to be performed in the loop. As the vdo code is fetched,
it is written to the loop buffer 40 via line 64. The SEQ0 32
sequencer executes the first iteration of the loop by setting its
loop program counter (LPC) 66 to the top of the loop and fetching the
instruction. After SEQ0 32 issues its first instruction, which may
possibly be a multi-issue packet of instructions, SEQ0 32 sets LPC 66
to the next instruction of the loop and repeats the process. After
SEQ0 32 has issued its first instructions, SEQ1 34 is enabled to
begin fetching from the loop buffer 40 (top of the loop). As SEQ1
fetches from the loop buffer 40, it sets its LPC 68 to the next
instruction and repeats the fetch/issue process.

The loop buffer 40 has two read port address lines 70, 72. Read port
address line 70 is used for sequencer SEQ0 32 and also control of a
normal do loop, while read port address line 72 is used for sequencer
SEQ1 34. Each read port address line 70, 72 may be multiple
instructions wide, depending upon the degree of multiple issue
supported. The dual issue
case of FIG. 2 is shown for illustrative purposes only, and the
invention need not be so limited. The loop buffer 40 is written from
the instruction stream fetched by the normal fetch path 74 via line
64.

Each sequencer 32, 34 includes a multiple issue packet number (mipn)
register 35 and a sequence iteration count (seq_iter_cnt) register
36. A multiple issue packet number, hereinafter which may also be
referred to simply as a packet number, is a unique number that is
assigned to each instruction in a given multi-issue packet as it is
stored into the loop buffer 40. Each sequencer 32, 34 begins its
fetch/issue process by checking, when enabled by all sequencers
processing preceding iterations, if the iteration counter 60
indicates any unexecuted iterations. Thus for example, if sequencer
32 initiates execution of the loop, sequencer 32 will copy the
current value of the iteration counter 60 to its sequence iteration
count (seq_iter_cnt) register 36 and then set iteration counter 60 to
the next iteration. Thus, iteration counter 60 will count the number
of iterations performed by both sequencers 32, 34. Sequencer 32 will
use the sequence iteration count (seq_iter_cnt) register 36 value for
resolution of functional unit usage conflicts with the other
sequencers, such as sequencer 34. When the sequencers 32, 34 complete
the instructions in the loop, the iteration counter 60 is checked and
the sequencers 32, 34 will continue processing if any iterations
remain.

Referring back to the previous example of the conventional zero
overhead loop, the code can be issued as follows:

    ld.post r2, r3, $4
    mul r2, r0, r2
    st.post r2, r5, $4
    ld.post r2, r3, $4
    mul r2, r0, r2
    st.post r2, r5, $4
    ld.post r2, r3, $4
    mul r2, r0, r2
    st.post r2, r5, $4
    ld.post r2, r3, $4
    mul r2, r0, r2
    st.post r2, r5, $4

The instructions in the above loop will take twelve cycles, one for
each line of code to execute, utilizing a prior art zero overhead
loop. In accordance with the present invention, utilizing two
sequencers such as sequencers 32, 34 of FIGS. 1 and 2, the
instructions require only seven cycles to execute, cutting the
execution time from twelve cycles down to seven cycles. Utilizing a
two sequencer vector zero overhead loop the instructions will be
issued as follows:

FIG. 3 illustrates in flow chart form the method for issuing
instructions followed by each sequencer 32, 34 of the vdo control 30
in accordance with the present invention. In step 100, each sequencer
32, 34 is idle. In step 110, it is determined if an instruction to be
executed is a vdo instruction. Instructions may issue one at a time
or in a group as a multiple issue packet. If the instruction is not a
vdo instruction, the vdo sequencers 32, 34 remain idle. If the
instruction is a vdo instruction, one of the sequencers, such as for
example sequencer 32, will initialize several registers on behalf of
all the instruction sequencers in step 120. In step 120, the value of
a number of instructions register 62 (num_instr) is set to the vdo
instruction count. Additionally, the value of a number of iterations
register 61 (num_iter) is set to the vdo iteration count, and an
iteration counter (iter_cnt) 60 is set to zero. Only sequencer 32
(SEQ0) performs the operations specified in step 120. In step 130,
loop program counter (lpc) 66 is set to zero, and a multiple issue
packet number (mipn) register 35 is set to zero.

In step 140, it is determined if the value of the iteration counter
(iter_cnt) 60 is equal to the value in the number of iterations
(num_iter) register 61. If the value of the iteration counter
(iter_cnt) 60 equals the value of the number of iterations (num_iter)
register 61, all instructions in the loop have been executed and the
sequencers 32, 34 return to an idle state in step 100. If the value
of the iteration counter (iter_cnt) 60 is not equal to the value of
the number of iterations (num_iter) register 61, i.e., there are
still instructions left to execute, it is next determined if an
enable from a previous sequencer SEQ(i-1) has been asserted in step
150. The enable for SEQ0 32 is always asserted, as SEQ0 32 executes
first. SEQ1 34 will determine if SEQ0 32 has enabled SEQ1 34 to fetch
and execute an instruction. If the enable has not been asserted, the
loop program counter (lpc) 66 is set to zero, and the multiple issue
packet number (mipn) register 35 is set to zero again in step 130. If
the enable has been asserted, sequence iteration count (seq_iter_cnt)
register 36 is set to the value of the iteration counter (iter_cnt)
60 and the iteration counter (iter_cnt) 60 is incremented in step
160. Additionally, in step 160, the enable to the succeeding
sequencer, SEQ(i+1), is asserted. In step 170, each respective
sequencer retrieves the next instruction of the current multiple
issue packet number (mipn) for that sequencer. In step 180, it is
determined if the value in the sequence iteration count
(seq_iter_cnt) register 36 for that sequencer is equal to the minimum
value of the sequence iteration count (seq_iter_cnt) register 36 for
all sequencers 32, 34. If the sequence iteration count of that
sequencer is equal to the minimum value of the sequence iteration
count (seq_iter_cnt) for all sequencers 32, 34, in step 210 it is
determined if the next instruction or instructions of the multiple
issue packet for that sequencer are ready for issue. If they are not
ready for issue, that sequencer will wait until they are ready for
issue. Once the next instruction is ready for issue (a YES response
in step 210), the instructions will be issued, the loop program
counter (lpc) will be incremented, and the multiple issue packet
number (mipn) register 35 will be set to a value indicating the
number of the multi-issue packet corresponding to the number of the
instruction indicated by the loop program counter (lpc) for that
instruction sequencer in step 220. In step 230, it is determined if
the loop program counter (lpc) is equal to the value in the number of
instructions (num_instr) register 62. If they are equal, the method
will return to step 130 and continue processing. If the loop program
counter (lpc) is not equal to the value in the number of instructions
(num_instr) register 62, the method will return to step 170 and
continue processing.

If the sequence iteration count (seq_iter_cnt) for that sequencer is
not equal to the minimum value of the sequence iteration count
(seq_iter_cnt) for all sequencers (a NO response in step 180), it is
determined if the multiple issue packet number (mipn) of the
sequencer with the next lower sequence iteration count (seq_iter_cnt)
is greater than the multiple issue packet number (mipn) +1, i.e., the
number of the next packet, of that sequencer in step 190. If the
multiple issue packet number (mipn) of the sequencer with the next
lower sequence iteration count (seq_iter_cnt) is greater than the
multiple issue packet number (mipn) +1 of that sequencer (a YES
response in step 190), it is determined if the next instruction or
instructions are ready for issue in step 210 as described above. If
the multiple issue packet number (mipn) of the sequencer with the
next lower sequence iteration count (seq_iter_cnt) is not greater
than the multiple issue packet number (mipn) +1 of that sequencer (a
NO response in step 190), it is determined in step 200 if the
sequencer with the next lower sequence iteration count (seq_iter_cnt)
is issuing the last instruction in that sequencer's multiple issue
packet number (mipn) +1 packet. If the sequencer with the next lower
sequence iteration count (seq_iter_cnt) is issuing the last
instruction in that sequencer's multiple issue packet number (mipn)
+1 packet (a YES response in step 200), it is determined if the next
instruction or instructions are ready for issue in step 210 as
described above. If the sequencer with the next lower sequence
iteration count (seq_iter_cnt) is not issuing the last instruction in
that sequencer's multiple issue packet number (mipn) +1 packet (a NO
response in step 200), the method returns to step 180 for continued
processing.
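
The decision made in steps 180 through 200 can be condensed into a single predicate. The following is a minimal Python sketch of that decision only, not a description of the actual hardware. The `Sequencer` class, its `issuing_last_of` field and the function name `may_check_ready` are hypothetical names introduced here for illustration; only `seq_iter_cnt` and `mipn` are taken from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sequencer:
    seq_iter_cnt: int                      # iteration this sequencer is working on
    mipn: int                              # current multiple issue packet number
    # Hypothetical field: number of the packet whose last instruction
    # this sequencer is issuing right now, or None if it is not doing so.
    issuing_last_of: Optional[int] = None


def may_check_ready(me: Sequencer, sequencers: List[Sequencer]) -> bool:
    """Return True when `me` may go on to the ready-for-issue test (step 210)."""
    # Step 180: the sequencer holding the oldest iteration always proceeds.
    if me.seq_iter_cnt == min(s.seq_iter_cnt for s in sequencers):
        return True
    # Locate the sequencer with the next lower sequence iteration count.
    older = [s for s in sequencers if s.seq_iter_cnt < me.seq_iter_cnt]
    pred = max(older, key=lambda s: s.seq_iter_cnt)
    # Step 190: the predecessor is already past the packet `me` would issue next.
    if pred.mipn > me.mipn + 1:
        return True
    # Step 200: the predecessor is issuing the last instruction of that packet.
    return pred.issuing_last_of == me.mipn + 1
```

A sequencer for which the predicate is false simply re-evaluates it later, which corresponds to the return to step 180.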

Thus, in accordance with the present invention, the speed 
of processing data vectors is increased by forming the loop 
instructions as producer-consumer instructions and utilizing 
more than one sequencer to allow for the start of an iteration 
N+1 of a program loop before iteration N of the program 
loop is completed. 
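
The producer-consumer condition lends itself to a simple software check of the kind a compiler or assembler might perform. The sketch below is an assumption-laden illustration, not the detection method of the invention: each instruction is modelled as a hypothetical pair (registers written, registers read), and a post-incremented address register counts as both written and read by the same instruction.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: (registers written, registers read) per instruction.
Instr = Tuple[Set[str], Set[str]]


def is_producer_consumer(loop: List[Instr]) -> bool:
    """Check that results go to different registers and that the result of
    instruction M is read only by instruction M or M+1."""
    writers: Dict[str, int] = {}
    for m, (dests, _) in enumerate(loop):
        for reg in dests:
            if reg in writers:
                return False          # two instructions share a result register
            writers[reg] = m
    for k, (_, srcs) in enumerate(loop):
        for reg in srcs:
            m = writers.get(reg)      # None for loop-invariant inputs such as r0
            if m is not None and k not in (m, m + 1):
                return False
    return True


# The conventional do loop: the load and the multiply both write r2.
do_loop = [({"r2", "r3"}, {"r3"}), ({"r2"}, {"r0", "r2"}), ({"r5"}, {"r2", "r5"})]
# The vdo loop: the multiply result was moved to r6.
vdo_loop = [({"r2", "r3"}, {"r3"}), ({"r6"}, {"r0", "r2"}), ({"r5"}, {"r6", "r5"})]
```

Under this model the conventional loop is rejected and the register-reassigned loop is accepted, which matches the r2 to r6 change described for the vdo example.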

FIG. 4A illustrates an example of a code sequence including
a multiple issue packet with the corresponding loop
program counter (lpc) value and mipn for each line. The
instruction sequence shown in FIG. 4A computes the product
of two vectors located in memory at the addresses
indicated by the initial values of r1 and r3. The product
vector is stored into memory at the address indicated by the
initial value of r5. The code includes four instructions, two
of which are grouped into a single multiple issue packet.
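
The relationship between lpc and mipn can be illustrated with a short Python sketch. The listing used here is hypothetical: it is merely consistent with the description (two loads in one packet, a multiply, a store, with r1, r3 and r5 as address registers), and the other register choices and the function name `number_loop` are assumptions. The sketch also assumes that lpc counts individual instructions and mipn counts packets, both starting at zero as in step 130.

```python
from typing import List, Tuple


def number_loop(packets: List[List[str]]) -> List[Tuple[int, int, str]]:
    """Assign an lpc value to every instruction and a mipn to every packet."""
    rows: List[Tuple[int, int, str]] = []
    lpc = 0
    for mipn, packet in enumerate(packets):
        for text in packet:
            rows.append((lpc, mipn, text))
            lpc += 1
    return rows


# Hypothetical four-instruction loop body; the first packet holds two loads.
hypothetical_loop = [
    ["ld.post r2, r1, $4", "ld.post r4, r3, $4"],
    ["mul r6, r2, r4"],
    ["st.post r6, r5, $4"],
]

for lpc, mipn, text in number_loop(hypothetical_loop):
    print(f"lpc={lpc} mipn={mipn}  {text}")
```

With this numbering the two loads share one packet number while each keeps its own lpc value.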

FIG. 4B illustrates in table form the behavior of the
sequencers 32, 34 when processing the code shown in FIG.
4A. The table of FIG. 4B illustrates the state sequence and
changing register values as each sequencer's 32, 34 state
machine processes the example loop. The state transitions
shown in the table of FIG. 4B and in the flow chart of FIG.
3 are the logical steps of the method and need not occur on
clock cycle boundaries. The upper part of the table shows the
first fourteen state transitions. The lower part of the table
shows the remaining state transitions. The table cells indicate
the value of the register named for the corresponding
row in each sequencer. The global register values (num_instr,
num_iter, iter_cnt) are only shown when they change.
Register values that change are shown with their new values
in the state where they are modified. The value x indicates
the value prior to any initialization by the state machine. The
initialization of the global registers to the values given in the
instruction is only performed by sequencer SEQ0 32.
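
The way several sequencers interleave iterations can be approximated with a small timing model. The Python sketch below rests on simplifying assumptions that are not taken from the figures: each sequencer issues one packet per cycle, an iteration may start no earlier than one cycle after the preceding iteration started, iterations are handed to sequencers in rotation, and functional unit conflicts and readiness stalls are ignored. The function name `loop_cycles` is hypothetical.

```python
from typing import List


def loop_cycles(num_iter: int, num_instr: int, num_seq: int) -> int:
    """Cycle in which the last packet issues, under the stated assumptions."""
    free_at: List[int] = [1] * num_seq   # first cycle in which each sequencer is idle
    prev_start = 0                       # start cycle of the preceding iteration
    finish = 0
    for it in range(num_iter):
        seq = it % num_seq               # assumed rotation among sequencers
        start = max(free_at[seq], prev_start + 1)
        finish = start + num_instr - 1
        free_at[seq] = finish + 1
        prev_start = start
    return finish


print(loop_cycles(num_iter=4, num_instr=3, num_seq=1))  # 12
print(loop_cycles(num_iter=4, num_instr=3, num_seq=2))  # 7
```

For the scalar multiply example (four iterations of three single-instruction packets) the model gives twelve cycles with one sequencer and seven with two, in agreement with the cycle counts stated for that example.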

Reference has been made to a preferred embodiment in
describing the invention. However, additions, deletions,
substitutions, or other modifications which would fall within
the scope of the invention defined in the claims may be
implemented by those skilled in the art and familiar with the
disclosure of the invention without departing from the spirit
or scope of the invention. Also, although the invention is
preferably implemented in hardware, it may be implemented
in hardware, software, or any combination of the two. All are
deemed equivalent with respect to the operation of the
invention. Accordingly, the invention is not to be considered
as limited by the foregoing description, but is only limited
by the scope of the appended claims.

What is claimed as new and desired to be protected by 
Letters Patent of the United States is: 

1. A processor for processing vector data, said processor 
comprising: 

a plurality of functional units; 

a register by-pass network connected to each of said 
plurality of functional units, said register by-pass net- 
work allowing a result produced by one of said plural- 
ity of functional units to be used as an operand by said 
one of said plurality of functional units or another of 
said plurality of functional units in an immediate suc- 
ceeding cycle; 

a register file connected to said register by-pass network, 
said register file adapted to store a result produced by 
each of said plurality of functional units; 

an instruction decode and dispatch unit connected to said 
register file and each of said plurality of functional 
units; and 

a vector zero overhead loop control circuit connected to 
said instruction decode and dispatch unit to receive 
instructions from said instruction decode and dispatch 
unit, said vector zero overhead loop control circuit 
comprising: 

a plurality of instruction sequencers, one of said plurality 
of instruction sequencers being adapted to start iteration
N+1 of a program loop before iteration N of said
program loop is completed by another of said plurality 
of instruction sequencers; and 

an iteration counter connected to said plurality of instruc- 
tion sequencers, said iteration counter being adapted to 
store a value representing a number of iterations that 
have been processed in whole or in part by said 
plurality of instruction sequencers. 

2. The processor according to claim 1, wherein said 
plurality of instruction sequencers further comprises: 

a first instruction sequencer connected to a loop buffer; 
and 

a second instruction sequencer connected to said first 
instruction sequencer and said loop buffer, 

wherein said program loop is written to said loop buffer 
and after said first instruction sequencer executes a first 
iteration of said program loop and issues a first 
instruction, said second instruction sequencer is 
enabled to fetch instructions from said loop buffer. 

3. The processor according to claim 1, wherein said 
plurality of functional units includes a shifter. 

4. The processor according to claim 2, wherein said zero 
vector overhead loop control circuit further comprises: 

a first register connected to said first and second instruc- 
tion sequencers, said first register being adapted to 
store a value representing a total number of iterations of 
said program loop performed by said first and second 
instruction sequencers. 

5. The processor according to claim 4, wherein said zero
vector overhead loop control circuit further comprises:

a second register connected to said first and second
instruction sequencers, said second register being
adapted to store a value representing a total number of
instructions of said program loop performed by said
first and second instruction sequencers.

6. The processor according to claim 5, wherein said zero
vector overhead loop control circuit further comprises:

a first loop program counter connected between said first
instruction sequencer and said loop buffer to count a
number of instructions of said program loop executed
by said first instruction sequencer; and

a second loop program counter connected between said
second instruction sequencer and said loop buffer to
count a number of instructions of said program loop
executed by said second instruction sequencer.

7. The processor according to claim 6, wherein each of
said first and second instruction sequencers further
comprises:

a third register adapted to store a value representing said
number of iterations from said iteration counter.

8. The processor according to claim 7, wherein each of
said first and second instruction sequencers further
comprises:

a fourth register adapted to store a value representing a
multi-issue packet number corresponding to an instruction
indicated by said first and second loop program
counter respectively.

9. The processor according to claim 1, wherein said
plurality of functional units includes a load/store functional
unit.

10. The processor according to claim 1, wherein said 
plurality of functional units includes a multiply/accumulate 
functional unit. 

11. A processor for processing a data vector, said processor
comprising:

a plurality of instruction sequencers, each of said instruction
sequencers being adapted to execute a succeeding
loop iteration of a vector zero overhead loop before a
preceding loop iteration has completed execution by
another of said plurality of instruction sequencers; and

an iteration counter connected to said plurality of instruction
sequencers, said iteration counter being adapted to
store a value representing a number of iterations that
have been processed in whole or in part by said
plurality of instruction sequencers.

12. The processor according to claim 11, further comprising:

a first register connected to said plurality of instruction
sequencers, said first register being adapted to store a
value representing a total number of iterations of said
vector zero overhead loop performed by said plurality
of instruction sequencers.

13. The processor according to claim 12, further comprising:

a second register connected to said plurality of instruction
sequencers, said second register being adapted to store
a value representing a total number of instructions of
said vector zero overhead loop performed by said
plurality of instruction sequencers.

14. The processor according to claim 13, further comprising:

a plurality of loop program counters, each of said plurality
of loop program counters being connected between a
respective one of said plurality of instruction sequencers
and a loop buffer.

15. The processor according to claim 14, wherein each of
said plurality of instruction sequencers further comprises:

a third register adapted to store a value representing said
number of iterations from said iteration counter.

16. The processor according to claim 15, wherein each of
said plurality of instruction sequencers further comprises:

a fourth register adapted to store a value representing a
multi-issue packet number corresponding to an instruction
indicated by a respective one of said loop program
counters.

17. The processor according to claim 11, wherein said 
processor includes a digital signal processor. 

18. The processor according to claim 11, wherein said 
processor includes a microprocessor. 

19. A method for processing a data vector in a processor, 
said method comprising the steps of: 

receiving instructions to process said data vector as a 
vector zero overhead loop; 

executing a loop iteration N of said vector zero overhead 
loop with one of a plurality of instruction sequencers 
and a loop iteration N+1 of said vector zero overhead
loop with another of said plurality of instruction 
sequencers before said loop iteration N is finished; and 

counting a number of iterations that have been executed 
by said plurality of instruction sequencers. 
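
By way of illustration only, and not as part of the claims, the overlap of iteration N and iteration N+1 recited in claim 19 can be sketched as a small cycle-level simulation. The sketch assumes that each sequencer issues one instruction per cycle, that iterations are handed out round-robin, and that each iteration starts at least one cycle after its predecessor. These timing assumptions and the name `simulate_overlap` are hypothetical and are not taken from the specification.

```python
def simulate_overlap(num_instructions, num_iterations, num_sequencers=2):
    """Return a schedule of (iteration, sequencer, start, finish) tuples
    and the final value of the shared iteration counter.

    Illustrative assumptions: one instruction issues per sequencer per
    cycle, and `finish` is the first cycle after the iteration's last issue.
    """
    iteration_counter = 0              # iterations started so far
    free_at = [0] * num_sequencers     # cycle at which each sequencer is free
    schedule = []
    prev_start = -1                    # start cycle of the preceding iteration
    while iteration_counter < num_iterations:
        seq = iteration_counter % num_sequencers     # round-robin hand-off
        start = max(free_at[seq], prev_start + 1)    # begin after predecessor began
        finish = start + num_instructions
        schedule.append((iteration_counter, seq, start, finish))
        free_at[seq] = finish
        prev_start = start
        iteration_counter += 1                       # count this iteration
    return schedule, iteration_counter
```

Under these assumptions, a four-instruction loop run for three iterations on two sequencers starts iteration 1 at cycle 1, while iteration 0 does not finish until cycle 4.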

20. The method according to claim 19, wherein said 
receiving step further comprises: 

receiving said instructions to process said data vector that 
are encoded as producer-consumer instructions. 

21. The method according to claim 19, further comprising 
the steps of: 

setting a number of instructions register equal to an 
instruction count of said vector zero overhead loop; 

setting a number of iterations register equal to an iteration 
count specified by said vector zero overhead loop; and 

setting an iteration counter to zero. 

22. The method according to claim 21, further comprising 
the steps of: 

setting a loop program counter to zero; and 

setting a multiple issue packet number register to zero. 

23. The method according to claim 22, further comprising 
the step of: 

determining if a number of iterations indicated by said 
iteration counter is equal to a value in said number of 
iterations register.

24. The method according to claim 23, wherein if an 
instruction sequencer determines said number of iterations 
indicated by said iteration counter is equal to a value in said 
number of iterations register, said method further comprises: 

making said instruction sequencer idle until a next vector 
zero overhead loop is received. 

25. The method according to claim 23, wherein if an 
instruction sequencer determines said number of iterations 
indicated by said iteration counter is not equal to a value in 
said number of iterations register, said method further com- 
prises: 

determining if said instruction sequencer has been 
enabled by a preceding instruction sequencer. 

26. The method according to claim 25, wherein if said 
instruction sequencer has not been enabled, said method 
further comprises: 

repeating said steps of setting said loop program counter 
to zero, setting said multiple issue packet number 
register to zero, and determining if a number of itera- 
tions indicated by said iteration counter is equal to a 
value in said number of iterations register step and said 
determining if said instruction sequencer has been 
enabled. 
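
For illustration only, the set-up and polling steps of claims 21 through 26 can be sketched as follows. The class names, field names, the `enabled` flag and the string return values are hypothetical; the claims name the registers but do not prescribe any software representation.

```python
from dataclasses import dataclass


@dataclass
class LoopRegisters:
    num_instructions: int        # "number of instructions register"
    num_iterations: int          # "number of iterations register"
    iteration_counter: int = 0   # shared "iteration counter"


@dataclass
class Sequencer:
    loop_pc: int = 0             # "loop program counter"
    packet_number: int = 0       # "multiple issue packet number register"
    enabled: bool = False        # assumed flag set by the preceding sequencer


def start_loop(instruction_count, iteration_count):
    """Claim 21: load both count registers and zero the iteration counter."""
    return LoopRegisters(instruction_count, iteration_count, 0)


def poll(seq, regs):
    """One pass through the checks of claims 22 to 26."""
    seq.loop_pc = 0                                    # claim 22
    seq.packet_number = 0                              # claim 22
    if regs.iteration_counter == regs.num_iterations:  # claim 23
        return "idle"                                  # claim 24
    if not seq.enabled:                                # claim 25
        return "retry"                                 # claim 26: repeat the checks
    return "begin"                                     # sequencer may take an iteration
```

A caller would invoke `poll` repeatedly while it returns `"retry"`, which mirrors the repetition recited in claim 26.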

27. The method according to claim 25, wherein if said 
instruction sequencer has been enabled, said method further 
comprises: 
setting a value of a sequence iteration count register of
said instruction sequencer equal to said iteration
counter;

incrementing said iteration counter; and

enabling a succeeding sequencer.

28. The method according to claim 27, further comprising
the steps of:

retrieving a next instruction to be executed by said
instruction sequencer; and

determining if said value of said sequence iteration count
register of said instruction sequencer is equal to a
minimum sequence iteration count of said plurality of
instruction sequencers.

29. The method according to claim 28, wherein if said
value of said sequence iteration count register of said
instruction sequencer is equal to a minimum sequence
iteration count of said plurality of instruction sequencers,
said method further comprises:

determining if said next instruction is ready for issue.

30. The method according to claim 29, wherein if said
next instruction is not ready for issue, said method further
comprises:

waiting until said next instruction is ready for issue.

31. The method according to claim 29, wherein if said
next instruction is ready for issue, said method further
comprises:

issuing said next instruction from said instruction
sequencer;

incrementing said loop program counter;

setting said multiple issue packet number register to a
value indicating a number of a multi-issue packet
corresponding to a number of an instruction indicated
by said loop program counter; and

determining if a value of said loop program counter is
equal to a value in said number of instructions register.

32. The method according to claim 31, wherein if said
value of said loop program counter is not equal to said value
in said number of instructions register, said method further
comprises:

repeating said steps of retrieving a next instruction to be
executed by said instruction sequencer and determining
if said value of said sequence iteration count register of
said instruction sequencer is equal to a minimum
sequence iteration count of said plurality of instruction
sequencers.

33. The method according to claim 31, wherein if said
value of said loop program counter is equal to said value in
said number of instructions register, said method further
comprises:

repeating said steps of setting said loop program counter
to zero, setting said multiple issue packet number
register number to zero, and determining if a number of
iterations indicated by said iteration counter is equal to
a value in said number of iterations register.

34. The method according to claim 28, wherein if said
sequence iteration count of said instruction sequencer is not
equal to a minimum sequence iteration count of said
plurality of instruction sequencers, said method further
comprises:

determining if a packet number of an instruction
sequencer with a next lower iteration count is greater
than a next packet number of said instruction
sequencer.

35. The method according to claim 34, wherein if said
packet number of an instruction sequencer with a next lower
iteration count is greater than a next packet number of said
instruction sequencer, said method further comprises:

determining if said next instruction is ready for issue.

36. The method according to claim 35, wherein if said
next instruction is not ready for issue, said method further
comprises:

waiting until said next instruction is ready for issue.

37. The method according to claim 35, wherein if said
next instruction is ready for issue, said method further
comprises:

issuing said next instruction from said instruction
sequencer;

incrementing said loop program counter;

setting said multiple issue packet number register to a
value indicating a number of a multi-issue packet
corresponding to a number of an instruction indicated
by said loop program counter; and

determining if a value of said loop program counter is
equal to a value in said number of instructions register.

38. The method according to claim 37, wherein if said
value of said loop program counter is not equal to said value
in said number of instructions register, said method further
comprises:

repeating said steps of retrieving a next instruction to be
executed by said instruction sequencer, and determining
if said value of said sequence iteration count
register of said instruction sequencer is equal to a
minimum sequence iteration count of said plurality of
instruction sequencers.

39. The method according to claim 37, wherein if said
value of said loop program counter is equal to said value in
said number of instructions register, said method further
comprises:

repeating said steps of setting said loop program counter
to zero, setting said multiple issue packet number
register number to zero, and determining if a number of
iterations indicated by said iteration counter is equal to
a value in said number of iterations register.

40. The method according to claim 34, wherein if said
packet number of an instruction sequencer with a next lower
sequence iteration count is not greater than a next packet
number of said instruction sequencer, said method further
comprises:

determining if said instruction sequencer with a next
lower sequence iteration count is issuing a last
instruction in said next packet number of said instruction
sequencer.

41. The method according to claim 40, wherein if said
instruction sequencer with a next lower sequence iteration
count is issuing a last instruction in said next packet number
of said instruction sequencer, said method further comprises:

determining if said next instruction is ready for issue.

42. The method according to claim 41, wherein if said
next instruction is not ready for issue, said method further
comprises:

waiting until said next instruction is ready for issue.

43. The method according to claim 41, wherein if said
next instruction is ready for issue, said method further
comprises:

issuing said next instruction from said instruction
sequencer;

incrementing said loop program counter;

setting said multiple issue packet number register to a
value indicating a number of a multi-issue packet
corresponding to a number of an instruction indicated
by said loop program counter; and

determining if a value of said loop program counter is
equal to a value in said number of instructions register.

44. The method according to claim 43, wherein if said 
value of said loop program counter is not equal to said value 
in said number of instructions register, said method further 
comprises: 

repeating said steps of retrieving a next instruction to be 
executed by said instruction sequencer, and determin- 
ing if said value of said sequence iteration count 
register of said instruction sequencer is equal to a 
minimum sequence iteration count of said plurality of 
instruction sequencers. 

45. The method according to claim 43, wherein if said 
value of said loop program counter is equal to said value in 
said number of instructions register, said method further
comprises:

repeating said steps of setting said loop program counter
to zero, setting said multiple issue packet number 
register number to zero, and determining if a number of 
iterations indicated by said iteration counter is equal to 
a value in said number of iterations register.

46. The method according to claim 40, wherein if said
instruction sequencer with a next lower sequence iteration 
count is not issuing a last instruction in said next packet 
number of said instruction sequencer, said method further 
comprises:

repeating said step of determining if said sequence itera- 
tion count of said instruction sequencer is equal to a 
minimum sequence iteration count of said plurality of 
instruction sequencers. 
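
For illustration only, the ordering test that runs through claims 28 to 46 can be condensed into a single predicate. It decides whether a sequencer may go on to the ready-for-issue check. The function name, the parameter names, and the use of `None` to mean "the predecessor is not issuing the last instruction of any packet this cycle" are hypothetical; the claims recite the comparisons but not this encoding.

```python
from typing import Optional


def may_proceed_to_issue(seq_iteration: int,
                         next_packet: int,
                         min_iteration: int,
                         pred_packet: int,
                         pred_last_of_packet: Optional[int]) -> bool:
    """Return True if the sequencer may go on to the ready-for-issue check.

    seq_iteration       -- this sequencer's sequence iteration count
    next_packet         -- packet number of this sequencer's next instruction
    min_iteration       -- minimum sequence iteration count of all sequencers
    pred_packet         -- packet number of the sequencer with the next
                           lower iteration count
    pred_last_of_packet -- packet whose last instruction that sequencer is
                           issuing now, or None (assumed encoding)
    """
    if seq_iteration == min_iteration:
        return True    # claims 28 and 29: the oldest iteration is never held back
    if pred_packet > next_packet:
        return True    # claims 34 and 35: predecessor is already past this packet
    if pred_last_of_packet == next_packet:
        return True    # claims 40 and 41: predecessor is finishing this packet now
    return False       # claim 46: repeat the minimum-iteration check later
```

Written this way, a younger iteration never issues from a packet that the iteration just before it has not yet completed. This ordering is consistent with the producer-consumer form of the loop instructions.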